ENT MERFISH Report

1. Overview

1.1 Sample Information

Brief sample information, generated from the submission table, is listed below for use in the following analysis.

Sample Index and Basic Information
Expt Sample Index Case_Number Sex Age HPV Sample_Region Genotype Group Region DataPath Clinic.comment
1 NT28530x2 NT28530 1 M 52 Negative Left posterior BOT NT Normal region_NT28530x2 Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.
1 TT28519x1 TT28519 1 M 52 Negative Left posterior BOT TT Tumor region_TT28519x1 Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.
1 TT28519x2 TT28519_dup 1 M 52 Negative Left posterior BOT TT Tumor region_TT28519x2 Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.
1 NT28530x1 NT28530_dup 1 M 52 Negative Left posterior BOT NT Normal region_NT28530x1 Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.

1.2 MERSCOPE Data Quality Summary

The summaries below present the data quality assessment that MERSCOPE generates automatically for each experiment. We focus mainly on transcript-level metrics for each sample. We look for high transcript density, judged by the transcript count per field of view (FOV), the transcript density per FOV, and the frequency of detected transcripts.

As a rule of thumb, a log10 transcript count > 4.0 in most areas can be considered a good quality standard for human tissue.
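
As a rough illustration of this criterion, the sketch below computes the log10 transcript count for each FOV and reports the fraction of FOVs above 4.0. The function name `fraction_fovs_passing`, the example counts, and the reading of "most areas" as a fraction of FOVs are assumptions made for illustration; they are not MERSCOPE output.

```python
import math


def fraction_fovs_passing(counts_per_fov, log10_threshold=4.0):
    """Return the fraction of FOVs whose log10(transcript count) exceeds the threshold."""
    if not counts_per_fov:
        raise ValueError("counts_per_fov must not be empty")
    passing = sum(
        1
        for count in counts_per_fov
        # FOVs with zero transcripts cannot pass (log10 is undefined there).
        if count > 0 and math.log10(count) > log10_threshold
    )
    return passing / len(counts_per_fov)


# Hypothetical transcript counts for five FOVs (not taken from this dataset).
example_counts = [25_000, 18_500, 9_200, 40_100, 12_750]
print(f"Fraction of FOVs passing: {fraction_fovs_passing(example_counts):.2f}")
```

A sample would then be judged by whether this fraction is high enough; the report does not state a numeric cutoff for "most".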

Note that the low accuracy of the DAPI cell boundaries is not a concern, because a custom cell segmentation step takes over this task.

1.2.1 NT28530x2 (Normal)

1.2.2 TT28519x1 (Tumor)

1.2.3 TT28519x2 (Tumor)

1.2.4 NT28530x1 (Normal)

1.3 Transcript Mismatch at FOV Boundaries

Due to an issue in the Vizgen software, transcripts are aligned incorrectly during decoding, which causes misalignment at field of view (FOV) boundaries.

Vizgen resolved this issue by temporarily updating the decoding software to an unreleased beta version.

However, this update also changed the output file format from “vzg” to “vzg2”. The current Vizgen Visualizer cannot read the new format, so Visualizer cannot be used for this dataset.

2. Data Processing & Analysis

2.0 Autofluorescence Issue

Autofluorescence is a major problem because it limits how sensitively the fluorescence from the applied dye or probe can be detected. It is visible in the DAPI images.

As strongly recommended by Dr. Tan, we removed the portion of cells affected by autofluorescence.

NT28530_DAPI_Imaging

NT28530_DAPI_Imaging_small

NT28530_dup_DAPI_Imaging

NT28530_dup_DAPI_Imaging_small

2.1 Cell Segmentation & Filtering

Using the spatial information and images obtained from MERFISH, we developed a machine learning model based on the Cellpose algorithm to segment individual cells from the MERFISH DAPI images.

To ensure data quality and cell accuracy, we defined minimum and maximum values for cell volume and a minimum gene count per cell. The cell volume must lie within [100, 2500], and the gene count per cell must be > 25. After the outliers are filtered out, the qualified cell count is shown in the following table.

The transcript count violin plot and the transcript count spatial map are displayed here as part of the quality control review.
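
The filtering rule can be sketched as follows. This is a minimal illustration using the thresholds stated above; the `Cell` record, the function names, the example cells, and the choice to treat the volume bounds as inclusive are assumptions. The report does not give the volume unit or say whether "gene count" means detected transcripts or distinct genes.

```python
from dataclasses import dataclass

# Thresholds stated in the text; the volume unit is not given in the report.
MIN_VOLUME = 100
MAX_VOLUME = 2500
MIN_GENE_COUNT = 25  # a cell must have strictly more than this


@dataclass
class Cell:
    cell_id: str
    volume: float
    gene_count: int


def passes_qc(cell):
    """Keep cells with volume in [100, 2500] (assumed inclusive) and gene count > 25."""
    return MIN_VOLUME <= cell.volume <= MAX_VOLUME and cell.gene_count > MIN_GENE_COUNT


def filter_cells(cells):
    """Return only the cells that pass both quality-control criteria."""
    return [cell for cell in cells if passes_qc(cell)]


# Hypothetical cells, for illustration only.
cells = [
    Cell("c1", 850.0, 120),   # kept
    Cell("c2", 60.0, 90),     # dropped: volume below 100
    Cell("c3", 3100.0, 400),  # dropped: volume above 2500
    Cell("c4", 700.0, 25),    # dropped: gene count is not above 25
]
kept = filter_cells(cells)
print([cell.cell_id for cell in kept])
```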

2.1.2 Transcript Count Violin

Transcript Count Violin After Filtering

2.2 Batch Effect & Dimension Reduction

We use Scanpy to analyze the single-cell transcriptome data. The first stage of the analysis removes batch effects, so that samples from different batches lie in a shared space and can be integrated and compared on a sound statistical basis. To achieve this, we use the Harmony algorithm.

We then visualize batch differences on a UMAP with Leiden clusters, and show the distribution of the Leiden clusters for later analysis.
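
A workflow of this kind might look like the sketch below, which uses Scanpy's external Harmony wrapper. It is an outline under stated assumptions, not the exact pipeline used for this report: the function name `integrate_and_cluster`, the normalization steps, the `batch` column name, and the default values for `n_pcs` and `resolution` are all illustrative. Running it requires the `scanpy`, `harmonypy`, and `leidenalg` packages.

```python
def integrate_and_cluster(adata, batch_key="batch", n_pcs=30, resolution=1.0):
    """Harmony-correct the PCA embedding, then compute Leiden clusters and a UMAP.

    `adata` is an AnnData object of raw counts whose `obs[batch_key]` column
    labels the batch of each cell (assumed column name).
    """
    # Imported lazily so this sketch can be loaded without the heavy dependencies.
    import scanpy as sc
    import scanpy.external as sce

    # Assumed preprocessing: library-size normalization and log transform.
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)
    sc.pp.pca(adata, n_comps=n_pcs)

    # Harmony writes the corrected embedding to adata.obsm["X_pca_harmony"].
    sce.pp.harmony_integrate(adata, key=batch_key)

    # Build the neighbor graph on the corrected embedding, then cluster and embed.
    sc.pp.neighbors(adata, use_rep="X_pca_harmony")
    sc.tl.leiden(adata, resolution=resolution)
    sc.tl.umap(adata)
    return adata
```

After this call, `sc.pl.umap(adata, color=["batch", "leiden"])` would draw the UMAP colored by batch and by Leiden cluster.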

UMAP of cells colored by batch

3. Cell Annotation

To annotate individual cell types, we took marker genes from the CellMarker database.

The markers we used are shown in the following dot plot.

3.1 Markers for type annotation

3.2 Cell Type UMAP

3.3 Cell Type Spatial Map

3.4 Cancer & T cell & B cell Spatial Map

3.5 Cell Type Count Table

Cell Type Count
cell_type_2 NT28530 NT28530_dup TT28519 TT28519_dup Total
B cell 2 3 1229 1037 2271
Cancer cell 34 31 465 352 882
Dendritic 21 11 165 131 328
Endothelial 1620 1437 1798 1336 6191
Fibroblast 2052 1719 19701 18105 41577
Macrophage 290 261 1878 1643 4072
Mast cell 35 33 467 481 1016
Smooth muscle cell 369 285 1609 1278 3541
T cell 115 82 829 700 1726
Total 4538 3862 28141 25063 61604
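
The per-sample proportions can be derived directly from the counts above. The sketch below divides each cell type count by its sample total. The counts are copied from the table; the function names are illustrative assumptions.

```python
# Cell counts per sample, copied from the Cell Type Count table.
SAMPLES = ["NT28530", "NT28530_dup", "TT28519", "TT28519_dup"]
COUNTS = {
    "B cell": [2, 3, 1229, 1037],
    "Cancer cell": [34, 31, 465, 352],
    "Dendritic": [21, 11, 165, 131],
    "Endothelial": [1620, 1437, 1798, 1336],
    "Fibroblast": [2052, 1719, 19701, 18105],
    "Macrophage": [290, 261, 1878, 1643],
    "Mast cell": [35, 33, 467, 481],
    "Smooth muscle cell": [369, 285, 1609, 1278],
    "T cell": [115, 82, 829, 700],
}


def sample_totals(counts, samples):
    """Sum the counts of all cell types for each sample."""
    return {
        sample: sum(row[i] for row in counts.values())
        for i, sample in enumerate(samples)
    }


def proportions_by_sample(counts, samples):
    """Return each cell type's share of the cells in each sample."""
    totals = sample_totals(counts, samples)
    return {
        cell_type: {sample: row[i] / totals[sample] for i, sample in enumerate(samples)}
        for cell_type, row in counts.items()
    }


props = proportions_by_sample(COUNTS, SAMPLES)
for cell_type in ("Fibroblast", "B cell"):
    shares = ", ".join(f"{s}: {props[cell_type][s]:.1%}" for s in SAMPLES)
    print(f"{cell_type}: {shares}")
```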

3.6 Cell Type Proportion

4. Differential Gene Expression Volcano

Here, we use a pseudo-bulk approach with DESeq2 to compute differential gene expression (DE). Genes are considered statistically significant when log2(Fold Change) > 2 or log2(Fold Change) < -2, and p_value < 0.05.
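
The cutoff can be sketched as a simple filter. In this illustration the first two rows are taken from the T cell table in Section 4.9, and the last two genes are hypothetical. Reading the rule as "(log2FC > 2 or log2FC < -2) and p < 0.05", and applying it to the `pvals` column rather than `pvals_adj`, are assumptions, since the report does not say which p-value is used.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    gene: str
    log2_fold_change: float
    p_value: float


def is_significant(result, lfc_cutoff=2.0, p_cutoff=0.05):
    """Apply |log2(fold change)| > 2 together with p-value < 0.05."""
    return abs(result.log2_fold_change) > lfc_cutoff and result.p_value < p_cutoff


results = [
    DEResult("FN1", 5.708059, 0.0),        # Section 4.9; p-value printed as 0.00e+00
    DEResult("CD40LG", -2.784972, 2e-07),  # Section 4.9
    DEResult("GENE_A", 1.5, 0.001),        # hypothetical: fold change too small
    DEResult("GENE_B", 3.0, 0.20),         # hypothetical: p-value too large
]

significant = [r.gene for r in results if is_significant(r)]
print(significant)
```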

4.1 All

names scores logfoldchanges pvals pvals_adj
COL1A1 128.788570 85.608810 0.0000000 0.0000000
FN1 81.902664 7.640585 0.0000000 0.0000000
FOS -47.173332 -6.782708 0.0000000 0.0000000
PDK4 -60.806812 -6.651487 0.0000000 0.0000000
COL5A1 80.620630 6.270690 0.0000000 0.0000000
EGR1 -34.516680 -5.052951 0.0000000 0.0000000
VEGFA -39.852993 -5.038163 0.0000000 0.0000000
CD79A 6.056874 4.974900 0.0000000 0.0000000
JUN -66.772200 -4.850358 0.0000000 0.0000000
GPX3 -34.593320 -4.005765 0.0000000 0.0000000
MZB1 6.873029 3.996346 0.0000000 0.0000000
SERPINA1 11.274291 3.937342 0.0000000 0.0000000
FCRL5 4.538768 3.787383 0.0000057 0.0000216
CX3CL1 -8.829142 -3.667626 0.0000000 0.0000000
POU2AF1 18.039322 3.628723 0.0000000 0.0000000
FAP 6.178362 3.500000 0.0000000 0.0000000
TNC 32.861107 3.404297 0.0000000 0.0000000
DES -31.845634 -3.285579 0.0000000 0.0000000
TNF 9.116647 3.140257 0.0000000 0.0000000
MYH11 -5.366857 -3.078943 0.0000001 0.0000004
DUSP1 -52.275627 -3.074179 0.0000000 0.0000000
XBP1 6.990413 3.069195 0.0000000 0.0000000
MMP9 11.380424 3.024176 0.0000000 0.0000000
RORC -14.301700 -2.973669 0.0000000 0.0000000
MMP11 14.477017 2.849452 0.0000000 0.0000000
BMP1 54.894270 2.836429 0.0000000 0.0000000
COL11A1 25.263824 2.801148 0.0000000 0.0000000
CCR2 5.294561 2.787528 0.0000001 0.0000005
CR2 4.698620 2.662321 0.0000026 0.0000105
TP63 -4.826907 -2.622918 0.0000014 0.0000057
CXCL1 6.459829 2.545726 0.0000000 0.0000000
PPARGC1A -4.270235 -2.514703 0.0000195 0.0000729
JUNB -35.778816 -2.490123 0.0000000 0.0000000
DERL3 3.347758 2.465135 0.0008147 0.0025145
ITGAX 5.379549 2.439574 0.0000001 0.0000003
LGR6 -2.545086 -2.426296 0.0109251 0.0278701
LGR5 -5.407744 -2.333752 0.0000001 0.0000003
CD19 2.867665 2.272296 0.0041351 0.0114865
MUC1 9.975843 2.272225 0.0000000 0.0000000
HGF 4.893160 2.267532 0.0000010 0.0000041
IRF4 3.042322 2.264538 0.0023476 0.0068244
FLI1 5.181536 2.245511 0.0000002 0.0000010
NCAM1 -16.074312 -2.220456 0.0000000 0.0000000
CLCA1 3.924798 2.207233 0.0000868 0.0003078
PIK3CG 6.272762 2.169015 0.0000000 0.0000000
CD248 30.148150 2.130536 0.0000000 0.0000000
ESCO2 2.794248 2.117717 0.0052021 0.0140596
TNFRSF13C 2.790883 2.108861 0.0052564 0.0141302
CD27 2.874682 2.099617 0.0040444 0.0113278
LOX 4.648842 2.094066 0.0000033 0.0000131
CHEK2 7.132932 2.084164 0.0000000 0.0000000
TNFRSF9 3.886243 2.083410 0.0001018 0.0003560
SPRY2 -21.722990 -2.031361 0.0000000 0.0000000
SH2D1B -3.091661 -2.020710 0.0019904 0.0058199
IL6R -19.963484 -2.006882 0.0000000 0.0000000

4.2 Cancer cell

names scores logfoldchanges pvals pvals_adj
FOS -9.785631 -18.199657 0.0000000 0.0000000
COL1A1 10.269500 14.999704 0.0000000 0.0000000
ATF3 -6.626699 -7.378474 0.0000000 0.0000000
JUNB -6.318369 -6.213871 0.0000000 0.0000000
EGR1 -6.256653 -4.785147 0.0000000 0.0000000
JUN -6.509843 -4.727665 0.0000000 0.0000000
DUSP1 -5.453072 -4.588524 0.0000000 0.0000031
FLT4 3.745493 4.222456 0.0001800 0.0056262
COL5A1 3.386069 3.657629 0.0007090 0.0186583
LMNA -3.628383 -3.537342 0.0002852 0.0079223
FN1 4.397565 3.367886 0.0000109 0.0004561
COL4A1 5.778854 3.235475 0.0000000 0.0000005
THBD -4.540221 -2.777444 0.0000056 0.0002554
PROX1 3.851474 2.765926 0.0001174 0.0039136
MYC -4.682372 -2.680914 0.0000028 0.0001418
EPHB4 5.252239 2.557996 0.0000002 0.0000083
ETS1 3.923814 2.085533 0.0000872 0.0031128
PKM 3.999442 2.051902 0.0000635 0.0024420

4.3 Dendritic

names scores logfoldchanges pvals pvals_adj
COL1A1 5.566146 13.829243 0.0000000 0.0000022
FOS -7.104612 -12.344484 0.0000000 0.0000000
HLA-DPB1 -6.568896 -9.209349 0.0000000 0.0000000
DUSP1 -5.928196 -5.547440 0.0000000 0.0000004
CD79A 3.075951 5.511458 0.0020983 0.0499601
TNFRSF13C 3.386980 5.160501 0.0007067 0.0235555
LMNA -6.268660 -5.043616 0.0000000 0.0000001
JUN -4.421127 -4.970809 0.0000098 0.0005786
CCR7 3.546910 4.765553 0.0003898 0.0149915
XCR1 -3.406603 -3.705974 0.0006578 0.0234917
CIITA -4.408372 -3.682036 0.0000104 0.0005786
ITGB2 -5.832042 -3.602481 0.0000000 0.0000005
EGR1 -3.082819 -3.388733 0.0020505 0.0499601
CD83 -3.119122 -2.797472 0.0018139 0.0477344
TGFBI -3.131877 -2.772608 0.0017369 0.0477344
HLA-DMA -4.489809 -2.684866 0.0000071 0.0005092
HAVCR2 -3.818692 -2.534944 0.0001342 0.0067080
HLA-DRB1 -3.356564 -2.348675 0.0007892 0.0246617
CTNNB1 -3.622459 -2.221795 0.0002918 0.0121590

4.4 Endothelial

names scores logfoldchanges pvals pvals_adj
COL1A1 48.494420 10.562972 0.0000000 0.0000000
VEGFA -35.800650 -10.077240 0.0000000 0.0000000
PLVAP 38.553684 9.591330 0.0000000 0.0000000
PDK4 -35.000160 -6.994374 0.0000000 0.0000000
COL4A1 37.052235 5.925527 0.0000000 0.0000000
PECAM1 27.910667 4.548041 0.0000000 0.0000000
SERPINE1 21.524950 4.273134 0.0000000 0.0000000
COL5A1 17.402040 4.073469 0.0000000 0.0000000
CD248 6.693452 3.390605 0.0000000 0.0000000
KIT 3.707403 3.382501 0.0002094 0.0010470
RORC -15.018403 -3.217684 0.0000000 0.0000000
MMP11 4.147833 3.211941 0.0000336 0.0001865
LMNA 29.895050 3.139546 0.0000000 0.0000000
CXCL2 -4.230872 -3.076571 0.0000233 0.0001326
PDGFRB 13.199806 3.033822 0.0000000 0.0000000
ACTA2 11.822330 2.952414 0.0000000 0.0000000
CD276 9.099426 2.858697 0.0000000 0.0000000
FN1 22.890614 2.811405 0.0000000 0.0000000
IL6R -14.837193 -2.797295 0.0000000 0.0000000
NEDD4 -14.296750 -2.759839 0.0000000 0.0000000
MMRN1 6.964719 2.737173 0.0000000 0.0000000
PPARGC1A -4.957076 -2.626450 0.0000007 0.0000047
TGFBR2 23.980795 2.584158 0.0000000 0.0000000
TGFB1 14.840543 2.512173 0.0000000 0.0000000
SH2D1B -3.438163 -2.469765 0.0005857 0.0027368
ETS1 20.773193 2.395918 0.0000000 0.0000000
ENG 20.871770 2.392597 0.0000000 0.0000000
CTSW 4.237784 2.299571 0.0000226 0.0001312
CLEC14A 21.314291 2.280237 0.0000000 0.0000000
TP63 -4.694613 -2.275646 0.0000027 0.0000165
E2F1 3.327995 2.223573 0.0008747 0.0040125
WWTR1 20.321642 2.218000 0.0000000 0.0000000
ITGA5 19.410233 2.066686 0.0000000 0.0000000
VWF 20.537136 2.056513 0.0000000 0.0000000
ADAMTS4 5.038486 2.025444 0.0000005 0.0000032
CD40 12.332045 2.025083 0.0000000 0.0000000
PGF 9.986552 2.019348 0.0000000 0.0000000
BAX 7.924216 2.011469 0.0000000 0.0000000
SELP 8.081866 2.004470 0.0000000 0.0000000

4.5 Fibroblast

names scores logfoldchanges pvals pvals_adj
COL1A1 99.584496 109.818320 0.0000000 0.0000000
FOS -47.300650 -10.960250 0.0000000 0.0000000
EGR1 -50.034084 -10.444641 0.0000000 0.0000000
FN1 58.606388 8.496051 0.0000000 0.0000000
JUN -63.576283 -7.565968 0.0000000 0.0000000
COL5A1 58.923054 6.571885 0.0000000 0.0000000
PDK4 -36.471450 -6.122548 0.0000000 0.0000000
GPX3 -34.555553 -6.022972 0.0000000 0.0000000
SERPINA1 9.788640 4.442682 0.0000000 0.0000000
CCR2 4.498826 3.797222 0.0000068 0.0000322
TNC 28.302124 3.778490 0.0000000 0.0000000
RET -3.031081 -3.755012 0.0024368 0.0087655
DUSP1 -42.773705 -3.725364 0.0000000 0.0000000
FLI1 4.944228 3.520309 0.0000008 0.0000039
SPRY2 -28.371538 -3.458082 0.0000000 0.0000000
BCL2 -21.155022 -3.385022 0.0000000 0.0000000
IL6R -11.101422 -3.365470 0.0000000 0.0000000
FAP 5.212951 3.358611 0.0000002 0.0000010
TNF 7.009517 3.302482 0.0000000 0.0000000
HLA-DPA1 2.531975 3.211472 0.0113422 0.0361217
TGFBI 15.144619 3.160163 0.0000000 0.0000000
KLF2 -22.572262 -3.158508 0.0000000 0.0000000
PREX2 -9.125145 -3.149026 0.0000000 0.0000000
PLA2G2A -10.425041 -2.906744 0.0000000 0.0000000
CR2 4.179469 2.903391 0.0000292 0.0001353
EPHA2 -5.967859 -2.879558 0.0000000 0.0000000
BMP1 42.946106 2.820837 0.0000000 0.0000000
EGFR -36.315860 -2.800635 0.0000000 0.0000000
LGR5 -5.342933 -2.777953 0.0000001 0.0000005
TEAD4 4.608928 2.773391 0.0000040 0.0000195
MMP11 10.750896 2.738437 0.0000000 0.0000000
PGF -6.039967 -2.737986 0.0000000 0.0000000
SOD2 -21.327894 -2.712913 0.0000000 0.0000000
CXCL1 5.300285 2.677718 0.0000001 0.0000006
NFKBIA -37.255253 -2.649212 0.0000000 0.0000000
S100A9 -3.752651 -2.575648 0.0001750 0.0007352
HGF 3.892633 2.475008 0.0000992 0.0004311
FZD7 -21.778078 -2.405200 0.0000000 0.0000000
CLCA1 3.290242 2.399776 0.0010010 0.0039410
JUNB -20.440767 -2.370539 0.0000000 0.0000000
CDK6 12.447824 2.350997 0.0000000 0.0000000
POU2AF1 10.437716 2.289855 0.0000000 0.0000000
CXCL2 7.364562 2.266815 0.0000000 0.0000000
CD1C 2.796931 2.261334 0.0051591 0.0173123
COL11A1 19.538797 2.241301 0.0000000 0.0000000
MET -2.548490 -2.225056 0.0108190 0.0346764
MMP9 8.461944 2.163420 0.0000000 0.0000000
ZAP70 7.091669 2.159623 0.0000000 0.0000000
COL4A1 -14.446335 -2.158494 0.0000000 0.0000000
PDGFRA -24.129530 -2.103215 0.0000000 0.0000000
MUC1 8.427344 2.101945 0.0000000 0.0000000
LOX 3.689835 2.072878 0.0002244 0.0009122
IL1R2 -3.213715 -2.024993 0.0013103 0.0049632

4.6 Macrophage

names scores logfoldchanges pvals pvals_adj
COL1A1 29.455763 21.839626 0.00e+00 0.0000000
FOS -22.721663 -12.711040 0.00e+00 0.0000000
MMP9 5.282391 6.655558 1.00e-07 0.0000029
SPP1 4.523012 6.098854 6.10e-06 0.0001089
FN1 17.127865 5.720485 0.00e+00 0.0000000
EGR1 -17.871151 -5.108791 0.00e+00 0.0000000
ITGAX 13.412282 4.330986 0.00e+00 0.0000000
TREM2 5.936416 4.138564 0.00e+00 0.0000001
PDK4 -15.192719 -4.066201 0.00e+00 0.0000000
MRC1 -18.314564 -3.879636 0.00e+00 0.0000000
COL5A1 9.964282 3.432761 0.00e+00 0.0000000
PKM 12.258259 2.780718 0.00e+00 0.0000000
DUSP1 -13.226417 -2.744173 0.00e+00 0.0000000
CD248 3.817119 2.670214 1.35e-04 0.0015700
HIF1A 6.205948 2.510126 0.00e+00 0.0000000
THBD -8.628058 -2.480686 0.00e+00 0.0000000
LYZ 4.869451 2.068499 1.10e-06 0.0000224
CD276 3.933034 2.063579 8.39e-05 0.0011983
LRP1 11.630071 2.041809 0.00e+00 0.0000000

4.7 Mast cell

names scores logfoldchanges pvals pvals_adj
COL1A1 11.535374 29.882832 0.0000000 0.0000000
FN1 3.918284 4.592460 0.0000892 0.0074318
VEGFA -6.780468 -4.312647 0.0000000 0.0000000
COL5A1 3.680838 4.157949 0.0002325 0.0166049
SFRP2 3.600834 4.012765 0.0003172 0.0198249
NFKB2 -6.740680 -3.857994 0.0000000 0.0000000
ICAM1 -4.802181 -2.619479 0.0000016 0.0001962
CSF1 3.980105 2.535041 0.0000689 0.0068885

4.8 Smooth muscle cell

names scores logfoldchanges pvals pvals_adj
COL1A1 35.726482 36.785250 0.0000000 0.0000000
COL4A1 26.232376 6.762253 0.0000000 0.0000000
FOS -13.716193 -6.434487 0.0000000 0.0000000
JUN -15.467052 -5.318596 0.0000000 0.0000000
FN1 19.043512 5.159483 0.0000000 0.0000000
TNF 4.654094 5.056424 0.0000033 0.0000440
COL5A1 21.388361 4.892525 0.0000000 0.0000000
JUNB -17.099833 -4.477895 0.0000000 0.0000000
ACTA2 -13.841452 -4.379541 0.0000000 0.0000000
LGR6 -5.337485 -4.143942 0.0000001 0.0000014
MYH11 -15.262327 -4.033347 0.0000000 0.0000000
PDGFRB 11.387992 3.660574 0.0000000 0.0000000
CHEK2 2.877333 3.354025 0.0040105 0.0308502
PDK4 -9.240222 -3.318832 0.0000000 0.0000000
PLVAP 5.894432 3.280383 0.0000000 0.0000001
SERPINE1 9.799668 3.096053 0.0000000 0.0000000
DUSP1 -14.130303 -3.035766 0.0000000 0.0000000
E2F1 3.320798 2.862792 0.0008976 0.0084680
MMP11 7.125006 2.760733 0.0000000 0.0000000
ATF3 -10.429241 -2.656524 0.0000000 0.0000000
BMP1 14.985505 2.525168 0.0000000 0.0000000
CX3CL1 -3.396368 -2.494866 0.0006829 0.0065660
ICAM3 4.387672 2.447743 0.0000115 0.0001364
DES -6.487512 -2.376898 0.0000000 0.0000000
SFRP2 5.420849 2.368788 0.0000001 0.0000009
CD248 12.927052 2.368017 0.0000000 0.0000000
MYBL2 3.005535 2.241821 0.0026511 0.0228547

4.9 T cell

names scores logfoldchanges pvals pvals_adj
COL1A1 18.151093 33.722046 0.00e+00 0.0000000
FN1 8.314957 5.708059 0.00e+00 0.0000000
CD40LG -5.232654 -2.784972 2.00e-07 0.0000278
FOS -3.948944 -2.107010 7.85e-05 0.0039248

5. Supplements

For the original images and more details, please use the Google Drive link: ENT MERFISH Supplement